This is Assignment Two of the Developing Data Products course in the Data Science Specialization on Coursera. This document outlines how to create a simple plot using the plotly package in R.
For this assignment we’ll use greenhouse gas emissions data from the Greenhouse Gas Reporting Program, published on the Government of Canada website. The Greenhouse Gas Reporting Program (GHGRP) collects information on greenhouse gas (GHG) emissions annually from facilities across Canada. Reporting is mandatory for facilities that meet the program requirements. For more information, see the website at https://climate-change.canada.ca/facility-emissions
Facilities that emit 50 kilotonnes or more of GHGs per year, in carbon dioxide (CO2) equivalent (eq.) units, must report their emissions to Environment and Climate Change Canada. For this assignment we’ll focus on the western provinces of Alberta and British Columbia.
Note: Data expressed in CO2 eq. units use the most recently revised global warming potential (GWP) values used internationally for GHG reporting.
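To make the CO2 eq. unit concrete: the mass of each gas is multiplied by its global warming potential and the results are added up. The sketch below assumes the 100-year GWP values from the IPCC Fourth Assessment Report (CO2 = 1, CH4 = 25, N2O = 298) and uses made-up emission masses. The GWP values the program currently applies may differ, and `to.co2e` is a hypothetical helper, not something provided with the dataset.

```r
# Assumed 100-year GWP values (IPCC Fourth Assessment Report); illustrative only
gwp <- c(CO2 = 1, CH4 = 25, N2O = 298)

# Hypothetical helper: convert tonnes of each gas to tonnes of CO2 equivalent
to.co2e <- function(tonnes, gwp) {
  # look up each gas's GWP by name, multiply, and add up
  sum(tonnes * gwp[names(tonnes)])
}

# Made-up emissions for one facility, in tonnes of each gas
facility <- c(CO2 = 40000, CH4 = 300, N2O = 10)

total.co2e <- to.co2e(facility, gwp)
total.co2e  # 40000*1 + 300*25 + 10*298 = 50480 tonnes CO2 eq.
```

Under these assumptions the made-up facility emits 50.48 kilotonnes CO2 eq., which would put it just above the 50 kilotonne reporting threshold mentioned above.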
The data is read with the read.csv function in R. The column names are cleaned by removing the French version of each name. Here we keep only the province or territory, the reference year, and the total emissions, expressed in tonnes of CO2 equivalent.
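As a rough illustration of the name cleaning, the sketch below applies the regular expression to a few made-up column names. The values in `example.names` are assumptions about how read.csv renders bilingual headers (spaces, slashes, and parentheses become dots); they are not copied from the file.

```r
library(stringr)

# Assumed examples of bilingual headers after read.csv has replaced
# spaces, slashes, and parentheses with dots
example.names <- c("Reference.Year...Annee.de.reference",
                   "Total.Emissions..tonnes.CO2e....Emissions.totales..tonnes.eq..CO2.",
                   "GHGRP.ID")

# Greedy match of everything before the last occurrence of three consecutive dots
english.part <- str_extract(example.names, ".*(?=\\.\\.\\.)")

# Names without a "..." separator yield NA, so fall back to the original name
cleaned <- ifelse(is.na(english.part), example.names, english.part)
cleaned
```

In the second name, four dots separate the two halves, and the greedy match keeps the first of them. That is why the cleaned name ends in a dot.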
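# load the packages used below
library(dplyr)    # %>%, select, mutate, filter, group_by, summarize, if_else
library(stringr)  # str_extract
library(DT)       # datatable
library(plotly)   # plot_ly, layout
# set url of the data
my.url <- "http://data.ec.gc.ca/data/substances/monitor/greenhouse-gas-reporting-program-ghgrp-facility-greenhouse-gas-ghg-data/PDGES-GHGRP-GHGEmissionsGES-2004-Present.csv"
# read data
ghg <- read.csv(my.url)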
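# get the raw column names
raw.names <- colnames(ghg)
# clean column names: keep the English part before the "..." separator
english.part <- str_extract(raw.names, ".*(?=\\.\\.\\.)")
# names without a separator are left unchanged
my.names <- if_else(is.na(english.part), raw.names, english.part)
colnames(ghg) <- my.names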
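# select columns of interest
my.ghg <- ghg %>%
  select(Facility.Province.or.Territory,
         Reference.Year,
         Total.Emissions..tonnes.CO2e.) %>%
  # convert the year to a date (1 January of that year)
  mutate(Reference.Year = as.Date(paste0(Reference.Year, "-01-01"), format = "%Y-%m-%d"),
         # replace zero totals with 1 tonne, then round to one decimal place
         Total.Emissions..tonnes.CO2e. =
           round(if_else(Total.Emissions..tonnes.CO2e. == 0, 1, Total.Emissions..tonnes.CO2e.), 1)) %>%
  # drop rows with missing values
  filter(complete.cases(.))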
# total emissions per province or territory and year
sum.ghg <- my.ghg %>%
  group_by(Facility.Province.or.Territory, Reference.Year) %>%
  summarize(Sum.per.Province = sum(Total.Emissions..tonnes.CO2e.))
# view table
datatable(sum.ghg,options = list(scrollX=TRUE,pageLength = 5))
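The grouping step can be checked on a small example. This sketch uses a made-up data frame, `toy`, with the same column names as `my.ghg`; the numbers are invented.

```r
library(dplyr)

# Made-up facility-level records (values are invented for illustration)
toy <- data.frame(
  Facility.Province.or.Territory = c("Alberta", "Alberta", "British Columbia", "Alberta"),
  Reference.Year = c(2004, 2004, 2004, 2005),
  Total.Emissions..tonnes.CO2e. = c(100.5, 200.0, 50.0, 300.0)
)

# One row per province and year, with the facility emissions added up
toy.sum <- toy %>%
  group_by(Facility.Province.or.Territory, Reference.Year) %>%
  summarize(Sum.per.Province = sum(Total.Emissions..tonnes.CO2e.)) %>%
  ungroup()

toy.sum
```

The two Alberta facilities from 2004 collapse into a single row of 300.5 tonnes, so four input rows become three summary rows.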
The plot below is a stacked bar chart showing the total greenhouse gas emissions (tonnes of CO2 equivalent) for each Canadian province or territory in each reporting year in the data.
# stacked bars: one trace per province or territory, coloured from the YlOrRd palette
p <- plot_ly(data = sum.ghg,
             x = ~Reference.Year,
             y = ~Sum.per.Province,
             type = "bar",
             color = ~Facility.Province.or.Territory,
             colors = "YlOrRd") %>%
  layout(yaxis = list(title = "Total greenhouse gas emissions (tonnes CO2 eq.)"), barmode = "stack")
p
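The introduction mentions a focus on Alberta and British Columbia, while the chart above covers every province and territory in the data. One possible way to narrow it down is to filter before plotting. The sketch below uses a made-up data frame, `toy.totals`, in place of `sum.ghg`. It assumes the province column holds full English names such as "Alberta", so check the actual values in the data before relying on it.

```r
library(dplyr)
library(plotly)

# Made-up yearly totals (invented numbers, same column names as sum.ghg)
toy.totals <- data.frame(
  Facility.Province.or.Territory = rep(c("Alberta", "British Columbia", "Ontario"), each = 2),
  Reference.Year = rep(c(2004, 2005), times = 3),
  Sum.per.Province = c(300, 320, 50, 55, 200, 210)
)

# Keep only the two western provinces (assumed spelling of the province names)
west <- toy.totals %>%
  filter(Facility.Province.or.Territory %in% c("Alberta", "British Columbia"))

# Stacked bar chart with one trace per province
p.west <- plot_ly(data = west,
                  x = ~Reference.Year,
                  y = ~Sum.per.Province,
                  color = ~Facility.Province.or.Territory,
                  type = "bar") %>%
  layout(yaxis = list(title = "Total greenhouse gas emissions (tonnes CO2 eq.)"),
         barmode = "stack")
p.west
```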